The point of departure for most work on word sense disambiguation is the multiple lexical entry view of the lexicon as introduced above: a lexical item is associated with discrete senses identified in advance, and the job of the disambiguation module is to select one of these senses as the meaning intended by the use of a particular word in a particular context. This approach is therefore subject to the criticisms of inadequacy put forth above, in that it ignores the potential influence of context on the precise sense carried by a particular use of a word: context can only influence the selection of a sense, not the determination of a sense.
Wilks (1975) relies on selectional restrictions to drive sense disambiguation: different senses of nouns are specified as having different semantic features, and different senses of verbs require arguments with specific semantic features. These features and requirements must be encoded in the lexicon. During processing, the selectional restrictions imposed by the verbs in a sentence interact with the semantic features associated with the nouns in the sentence to identify the intended sense of each word. As pointed out by Kilgarriff (1992), this approach is limited because it does not consider syntax in the disambiguation process, and it takes no account of semantic context beyond verb-argument relations. Furthermore, selectional restrictions can be overridden in sufficiently rich discourse contexts.
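To make the interaction between verb requirements and noun features concrete, here is a minimal sketch in Python. The lexicon, the feature names (`ANIMATE`, `FOOD`, and so on), and the function `disambiguate` are hypothetical illustrations rather than Wilks' actual representations, and the restrictions are treated as hard constraints on a single subject-verb-object triple.

```python
from itertools import product

# Hypothetical toy lexicon (illustration only): each noun sense carries
# a set of semantic features.
NOUN_SENSES = {
    "crane": {"crane/bird": {"ANIMATE", "BIRD"},
              "crane/machine": {"MACHINE", "PHYSOBJ"}},
    "fish": {"fish/animal": {"ANIMATE", "FOOD"}},
}

# Each verb sense states the feature required of its subject and object.
VERB_SENSES = {
    "eat": {"eat/consume": ("ANIMATE", "FOOD"),
            "eat/corrode": ("SUBSTANCE", "METAL")},
}

def disambiguate(subj, verb, obj):
    """Return every (subject, verb, object) sense triple whose
    selectional restrictions are all satisfied."""
    readings = []
    for (s, s_feats), (v, (need_s, need_o)), (o, o_feats) in product(
            NOUN_SENSES[subj].items(),
            VERB_SENSES[verb].items(),
            NOUN_SENSES[obj].items()):
        if need_s in s_feats and need_o in o_feats:
            readings.append((s, v, o))
    return readings

print(disambiguate("crane", "eat", "fish"))
# [('crane/bird', 'eat/consume', 'fish/animal')]
```

For ``the crane ate the fish'' only the bird reading of crane and the consuming reading of eat survive. Because the restrictions act as hard filters here, a sentence in which rich discourse context overrides them would simply receive no reading, which is one concrete form of the limitation noted above.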
In direct contrast to Wilks' single-pronged approach to disambiguation is the system developed by Hirst (1987), which allows for the interaction of cues from selectional restrictions, syntax, semantic relations, and the linguistic context to select a word sense. Hirst utilises a marker-passing technique. Markers are passed from lexical entries activated by the sentence to related nodes in the knowledge base, in the spirit of spreading activation in a semantic network. Convergences between specific senses of different words in a sentence are taken to indicate that those senses are selected in that sentence. This approach is motivated by data from the psycholinguistic literature (e.g. semantic priming experiments) indicating that the occurrence of a word initially activates all of its senses before the context establishes the intended sense. Again, this approach assumes that the possible senses for a word are established in advance and can be discriminated in context.
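The marker-passing idea can be illustrated with a breadth-first search over a toy network. In the sketch below, which assumes a hypothetical knowledge base `LINKS` and hypothetical sense labels, markers spread from each candidate sense up to a fixed depth, and a collision is recorded wherever markers from senses of two different words reach a common node. It is a simplification for illustration, not a reconstruction of Hirst's system.

```python
from collections import deque

# Hypothetical toy knowledge base (illustration only): undirected links
# between sense nodes and concept nodes.
LINKS = {
    "bank/river": ["river"],
    "bank/finance": ["finance"],
    "interest/curiosity": ["mental_state"],
    "interest/money": ["finance"],
    "river": ["bank/river", "water"],
    "finance": ["bank/finance", "interest/money", "money"],
    "mental_state": ["interest/curiosity"],
    "water": ["river"],
    "money": ["finance"],
}

def spread(start, max_depth):
    """Breadth-first marker passing: return every node reachable
    from `start` within `max_depth` links."""
    reached = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue
        for nxt in LINKS.get(node, []):
            if nxt not in reached:
                reached.add(nxt)
                frontier.append((nxt, depth + 1))
    return reached

def marker_collisions(senses_by_word, max_depth=2):
    """Return (sense1, sense2, meeting nodes) for senses of
    different words whose markers reach a common node."""
    marked = {s: spread(s, max_depth)
              for senses in senses_by_word.values() for s in senses}
    words = list(senses_by_word)
    hits = []
    for i, w1 in enumerate(words):
        for w2 in words[i + 1:]:
            for s1 in senses_by_word[w1]:
                for s2 in senses_by_word[w2]:
                    meet = marked[s1] & marked[s2]
                    if meet:
                        hits.append((s1, s2, sorted(meet)))
    return hits

senses = {"bank": ["bank/river", "bank/finance"],
          "interest": ["interest/curiosity", "interest/money"]}
print(marker_collisions(senses, max_depth=1))
# [('bank/finance', 'interest/money', ['finance'])]
```

With a depth of one, only the financial senses of bank and interest meet (at the node finance), so those are the senses selected. The depth limit is a design choice of this sketch: in a large, densely connected network, unrestricted spreading would let almost any pair of senses meet.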
McRoy (1992) develops further the idea that word sense disambiguation must involve information from many sources. She depends on a lexicon in which coarse distinctions among senses are encoded (in addition to syntactic information), and which is linked to a conceptual hierarchy and to information about word frequency and collocations derived from a corpus. The effect of semantic context is modeled via sets of senses, or clusters, which group together senses that are related in some way. There are three kinds of clusters: categorial clusters, which are sets of senses that share a conceptual parent in the hierarchy; functional clusters, whose members share a specified functional relationship to some entity (e.g. a part-whole relationship); and situational clusters, which group together senses that tend to occur together in a common setting or event. These clusters are defined in advance in the lexicon of the system. They serve essentially the same purpose as Hirst's semantic network; a cluster containing a sense of a word will be active if it ``contains any of the senses under consideration for other words in the current paragraph'' (McRoy 1992:20). The process of word sense disambiguation corresponds to sense selection based on context. After a preliminary processing phase that integrates sense indications from morphology, syntax, and clusters, the preferred senses of all the words in a sentence are fed into a parser and then into a semantic interpreter, which establishes the precise relations between the words in the sentence (these relations come from the clusters rather than through modulation) and makes the final sense selection.
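McRoy's activation test for clusters can be sketched as follows. The cluster inventory `CLUSTERS`, the sense labels, and the scoring rule (counting the active clusters that contain each sense) are hypothetical illustrations; in McRoy's system cluster evidence is only one preference, combined with morphological and syntactic indications before parsing.

```python
# Hypothetical clusters (illustration only), fixed in advance in the lexicon.
CLUSTERS = {
    # categorial: senses sharing a conceptual parent
    "categorial:financial_institution": {"bank/finance", "credit_union/institution"},
    # functional: senses sharing a part-whole relation to some entity
    "functional:boat_part": {"hull/boat", "keel/boat"},
    # situational: senses that tend to co-occur in a setting or event
    "situational:river_trip": {"bank/river", "canoe/boat", "current/flow"},
    "situational:banking": {"bank/finance", "deposit/money", "teller/clerk"},
}

def active_clusters(word, candidates):
    """Clusters that contain a sense of `word` and also contain a sense
    under consideration for some other word in the paragraph."""
    mine = set(candidates[word])
    others = {s for w, senses in candidates.items() if w != word for s in senses}
    return [name for name, members in CLUSTERS.items()
            if members & mine and members & others]

def cluster_scores(word, candidates):
    """Number of active clusters containing each candidate sense of `word`."""
    active = active_clusters(word, candidates)
    return {s: sum(s in CLUSTERS[c] for c in active) for s in candidates[word]}

paragraph = {"bank": ["bank/river", "bank/finance"],
             "canoe": ["canoe/boat"],
             "current": ["current/flow", "current/electric"]}
print(cluster_scores("bank", paragraph))
# {'bank/river': 1, 'bank/finance': 0}
```

The banking and financial-institution clusters also contain a sense of bank, but neither contains a sense of any other word in the paragraph, so they stay inactive and only the river-trip cluster lends support.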
Another mechanism which could be integrated with the McRoy-style approach has been proposed by Agirre and Rigau (1995), who use the taxonomy provided by WordNet (Miller 1990) as the basis for a measure of the conceptual distance between words in a window of context around the word to be disambiguated. The sense selected is the (predefined) sense whose WordNet subhierarchy yields the highest conceptual density. This is a more formulaic, probabilistic approach to determining contextually-based sense preferences, but it is similar in spirit to the cluster approach used by McRoy. Other algorithmic approaches to word sense disambiguation using WordNet and similarity measures are reported in Szpakowicz (1997) and Resnik (1995).
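The following sketch illustrates the principle of choosing the sense whose subhierarchy is densest in context words. It uses a hypothetical toy taxonomy rather than WordNet, and a deliberately simplified density: the number of words with a sense under a node, divided by the number of nodes under it. Agirre and Rigau's actual conceptual density formula is more elaborate (among other things it takes the branching of the hierarchy into account), so this should be read as an illustration only.

```python
# Hypothetical toy taxonomy (illustration only): child -> parent.
PARENT = {
    "entity": None,
    "organism": "entity", "flora": "organism",
    "plant/flora": "flora", "tree/flora": "flora", "flower/flora": "flora",
    "artifact": "entity", "structure": "artifact",
    "plant/factory": "structure", "warehouse/building": "structure",
    "abstraction": "entity", "tree/diagram": "abstraction",
}

# Hypothetical sense inventory: word -> its sense nodes in the taxonomy.
SENSES = {
    "plant": ["plant/flora", "plant/factory"],
    "tree": ["tree/flora", "tree/diagram"],
    "flower": ["flower/flora"],
}

def ancestors(node):
    """Proper ancestors of `node`, nearest first."""
    out = []
    parent = PARENT[node]
    while parent is not None:
        out.append(parent)
        parent = PARENT[parent]
    return out

def subtree(root):
    """`root` together with every node below it."""
    return {n for n in PARENT if n == root or root in ancestors(n)}

def density(root, words):
    """Simplified density: words having some sense under `root`,
    divided by the number of nodes under `root`."""
    nodes = subtree(root)
    hits = sum(any(s in nodes for s in SENSES[w]) for w in words)
    return hits / len(nodes)

def best_sense(target, context):
    """Score each sense of `target` by the densest subhierarchy above it."""
    words = [target] + context
    scored = {s: max(density(a, words) for a in ancestors(s))
              for s in SENSES[target]}
    return max(scored, key=scored.get), scored

print(best_sense("plant", ["tree", "flower"])[0])  # plant/flora
```

Here the flora subhierarchy contains senses of all three words in only four nodes (density 0.75), whereas the best subhierarchy above the factory sense contains only plant itself (density 1/3).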
Each of the above approaches uses explicitly encoded semantic relationships associated with particular pre-established senses in order to disambiguate a word. More recently, methodologies have been explored which allow a system to learn how to disambiguate a word (e.g. Cottrell 1989). For example, Véronis and Ide (1995) construct a neural network on the basis of a machine readable dictionary: input nodes correspond to words, output nodes correspond to senses (words tagged with a sense number as found in the dictionary), and the network is trained on the texts of the dictionary definitions. In this way, the network acquires a representation of the significant semantic relations among senses of words, in contrast to systems for which these semantic relations must be represented in symbolic terms (usually through a conceptual hierarchy). However, these approaches depend on a very large initial training set to establish reliable relationships between words. Since a given sense of a word may appear in various kinds of contexts, the training set must incorporate a wide range of contexts in order to develop a satisfactory predictive model. The development of such a training set would be very time-consuming.
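A network whose input nodes are words and whose output nodes are senses, trained on dictionary definitions, can be sketched with a single-layer perceptron. The two-entry dictionary, the sense labels, and the training rule (a standard multi-class perceptron update, applied whenever the correct sense does not strictly outscore its best rival) are assumptions for illustration; this is not a reconstruction of Véronis and Ide's network.

```python
from collections import defaultdict

# Hypothetical two-entry "dictionary" (illustration only): sense -> definition.
DEFINITIONS = {
    "bank/river": "sloping land beside a river or lake",
    "bank/finance": "an institution that keeps money and lends it",
}

def tokens(text):
    """Input nodes: one per lower-cased word."""
    return text.lower().split()

def scores(weights, words):
    """Activation of each output (sense) node: sum of word-to-sense weights."""
    return {s: sum(weights[s].get(w, 0.0) for w in words) for s in weights}

def train(definitions, epochs=10):
    """Multi-class perceptron trained on the definition texts.
    Assumes at least two senses."""
    weights = {s: defaultdict(float) for s in definitions}
    for _ in range(epochs):
        for gold, text in definitions.items():
            words = tokens(text)
            sc = scores(weights, words)
            rival = max((s for s in sc if s != gold), key=sc.get)
            # Update unless the correct sense strictly outscores its best rival.
            if sc[gold] <= sc[rival]:
                for w in words:
                    weights[gold][w] += 1.0
                    weights[rival][w] -= 1.0
    return weights

def predict(weights, context):
    """Return the most strongly activated sense for a context."""
    sc = scores(weights, tokens(context))
    return max(sc, key=sc.get)

weights = train(DEFINITIONS)
print(predict(weights, "she deposited money at the bank"))  # bank/finance
print(predict(weights, "we walked along the river"))        # bank/river
```

Here ``money'' pulls a context toward the financial sense and ``river'' toward the river sense. A context sharing no words with either definition gets a score of zero for every sense, which shows on a tiny scale why such models need training data covering a wide range of contexts.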
Statistical techniques are currently also being exploited to solve the disambiguation problem (e.g. Yarowsky 1995, Fujii 1996, Pedersen 1997). These approaches generally use features of the surrounding words, such as surface form, part of speech, and morphology, as the basis for classifying the different senses of an ambiguous word. A set of disambiguated example sentences, usually derived from a large corpus, drives the development of a classification algorithm. In some cases (e.g. Fujii 1996) the features used for classification also include the similarity of the context words to the ambiguous word, estimated on the basis of relationships drawn from a thesaurus. Guthrie et al. (1991) use the definitions in the machine readable version of the Longman Dictionary of Contemporary English (LDOCE) to create word sense neighbourhoods, divided according to the subject codes represented in the MRD, which are then used for a probabilistic account of word sense disambiguation. Similarly, Wilks and Stevenson (1997) utilise the senses discriminated in LDOCE to disambiguate texts, combining part of speech information with a measure of the overlap between the dictionary definitions of the various senses and the textual context; they achieve 86% accuracy in assigning words in their small sample text to the correct homograph (and 57% accuracy in sense assignment). This result shows that an approach to word sense disambiguation which combines information sources and techniques is promising for disambiguation to pre-encoded senses. Some of the approaches outlined here depend on a large corpus that must be disambiguated in advance, and they therefore suffer from the same problems as the neural network approaches. Other approaches proceed by analysing the raw (undisambiguated) corpus, allowing the model to create its own sense distinctions, which might result in ad hoc senses for a word.
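As an illustration of classification from surrounding words, the sketch below trains a naive Bayes classifier with add-one smoothing on a handful of hypothetical sense-tagged sentences, using only the surface forms of the context words as features. Naive Bayes is just one common choice of classifier; the data and function names are invented, and the sketch is not meant to reproduce any of the cited systems.

```python
import math
from collections import Counter, defaultdict

# Hypothetical sense-tagged training sentences (illustration only).
TRAIN = [
    ("he cashed a cheque at the bank", "bank/finance"),
    ("the bank raised its interest rate", "bank/finance"),
    ("they fished from the bank of the river", "bank/river"),
    ("reeds grew along the muddy bank", "bank/river"),
]

def train_nb(examples):
    """Count senses and, per sense, the words occurring in its contexts."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, sense in examples:
        words = text.split()
        sense_counts[sense] += 1
        word_counts[sense].update(words)
        vocab.update(words)
    return sense_counts, word_counts, vocab

def classify(model, context):
    """Return the sense maximising log P(sense) + sum of log P(word | sense),
    with add-one smoothing; words never seen in training are skipped."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best, best_lp = None, -math.inf
    for sense, n in sense_counts.items():
        lp = math.log(n / total)
        denom = sum(word_counts[sense].values()) + len(vocab)
        for w in context.split():
            if w in vocab:
                lp += math.log((word_counts[sense][w] + 1) / denom)
        if lp > best_lp:
            best, best_lp = sense, lp
    return best

model = train_nb(TRAIN)
print(classify(model, "the river bank was muddy"))   # bank/river
print(classify(model, "interest rate at the bank"))  # bank/finance
```

Even this toy model can classify nothing until it has seen sense-tagged examples, which is the dependence on a pre-disambiguated corpus discussed above.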
The approaches which rely on the lexica in machine readable dictionaries or thesauri allow sense disambiguation to proceed relative to a well-defined set of senses, but these lexica are expensive to create (Wilks and Stevenson 1997). Ultimately a hybrid semi-interactive approach, combining raw corpus analysis with theoretically-motivated ``tweaking'' (possibly with respect to an MRD) of the resulting model, may prove to give the best results.
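For disambiguation relative to an MRD sense inventory, the definition-overlap measure that Wilks and Stevenson combine with part-of-speech information (mentioned earlier in this section) can be sketched as a part-of-speech filter followed by a count of content words shared between each remaining definition and the context. The dictionary entries, the stopword list, and the function names below are hypothetical, and this simplification is not their actual system.

```python
# Hypothetical mini-dictionary (illustration only):
# word -> list of (sense label, part of speech, definition text).
ENTRIES = {
    "bank": [
        ("bank/n/river", "n", "land along the side of a river"),
        ("bank/n/finance", "n", "a place where money is kept and paid out"),
        ("bank/v/tilt", "v", "to tilt an aircraft when turning"),
    ],
}

# Hypothetical stopword list: function words ignored when counting overlap.
STOP = {"a", "an", "the", "of", "to", "in", "is", "and", "by", "my",
        "where", "when", "along", "out", "we", "i"}

def content_words(text):
    """Lower-cased words of `text`, minus stopwords."""
    return {w for w in text.lower().split() if w not in STOP}

def overlap_sense(word, pos, context):
    """Keep only senses with the given part of speech, then pick the one
    whose definition shares the most content words with the context.
    Raises ValueError if no sense has that part of speech."""
    ctx = content_words(context)
    candidates = [(sense, d) for sense, p, d in ENTRIES[word] if p == pos]
    best = max(candidates, key=lambda sd: len(ctx & content_words(sd[1])))
    return best[0]

print(overlap_sense("bank", "n", "we sat by the river bank"))     # bank/n/river
print(overlap_sense("bank", "n", "I kept my money in the bank"))  # bank/n/finance
```

The part-of-speech filter alone settles the verb use, mirroring the homograph-level decision; the overlap count then chooses among the remaining noun senses.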
A review of the various learning techniques as applied to the sense disambiguation problem appears in Mooney (1996), to which I refer the interested reader. I will not go into further detail; the general discussion here is enough to support the conclusion that extant automatic approaches to word sense disambiguation depend on pre-encoded distinctions between senses and do not consider the influence of context on determining the precise sense of a word. Although these approaches are adequate for certain NLP tasks that can function solely on the basis of coarse meaning discrimination, they are insufficient under a generative, contextually modulated view of lexical semantics and for more general NLU/NLG tasks.